Abstract

Let $p$ and $q$ be probability vectors with the same entropy $h$. Denote
by $B(p)$ the Bernoulli shift indexed by $\mathbb{Z}$ with marginal distribution $p$.
Suppose that $\varphi$ is a measure preserving homomorphism from $B(p)$ to
$B(q)$. We prove that if the coding length of $\varphi$ has a finite 1/2 moment,
then $\sigma_p^2 = \sigma_q^2$, where $\sigma_p^2 = \sum_i p_i(-\log p_i - h)^2$ is the informational
variance of $p$. In this result, the 1/2 moment cannot be replaced
by a lower moment. On the other hand, for any $\theta < 1$, we exhibit
probability vectors $p$ and $q$ that are not permutations of each other,
such that there exists a finitary isomorphism $\Phi$ from $B(p)$ to $B(q)$
where the coding lengths of $\Phi$ and of its inverse have a finite $\theta$ moment.
We also present an extension to ergodic Markov chains.

1 Introduction 

Let $A = \{\alpha_0, \dots, \alpha_{a-1}\}$ be a finite alphabet and $p = (p_0, \dots, p_{a-1})$ a
probability vector with entropy $h(p) = \sum_{j=0}^{a-1} -p_j \log(p_j)$. Consider the Bernoulli
shift $B(p) = (X, \mathcal{A}, P, T)$, where $X = A^{\mathbb{Z}}$ is equipped with the
product $\sigma$-algebra $\mathcal{A}$, the product measure $P = p^{\mathbb{Z}}$ and the left shift $T$. Let
$B = \{\beta_0, \dots, \beta_{b-1}\}$ be another finite alphabet, and $q = (q_0, \dots, q_{b-1})$ a
probability vector; denote by $B(q) = (Y, \mathcal{B}, Q, T)$ the corresponding Bernoulli
shift. A homomorphism $\varphi$ from $B(p)$ to $B(q)$ is a measurable map from
$X$ to $Y$, defined $P$-a.e., such that $P\varphi^{-1} = Q$ and $\varphi T = T\varphi$ $P$-a.e. An
isomorphism is an invertible homomorphism. A homomorphism $\varphi$ from $B(p)$
to $B(q)$ is finitary if there exists a set $W \subset X$ with $P(W) = 1$ that has the
following property: for all $x \in W$ there exists $n = n(x)$ such that if $\tilde{x} \in W$
and $\tilde{x}_i = x_i$ for all $-n \le i \le n$, then $(\varphi(\tilde{x}))_0 = (\varphi(x))_0$. We write $N_\varphi(x)$
for the minimal such $n$, and call $N_\varphi$ the coding length of $\varphi$. A finitary
isomorphism is an invertible finitary homomorphism whose inverse is also
finitary.

By the Kolmogorov-Sinai Theorem (see, e.g., [10]), if $B(p)$ and $B(q)$ are
isomorphic, then $h(p) = h(q)$. The converse was established by Ornstein [6].
Keane and Smorodinsky [3] proved that if $h(p) = h(q)$, then there exists a
finitary isomorphism from $B(p)$ to $B(q)$. Parry and Schmidt (see [7, 8, 9, 11]) showed
that if a finitary isomorphism from $B(p)$ to $B(q)$ has finite expected coding
length in both directions, then $p$ and $q$ must be permutations of each other.

In this paper, we prove that the informational variance of $p$,

$$\sigma_p^2 = \sum_{i=0}^{a-1} p_i \bigl(-\log(p_i) - h(p)\bigr)^2,$$

is an invariant of isomorphisms $\varphi$ that satisfy $E\bigl(N_\varphi^{1/2}\bigr) < \infty$. More precisely:
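As a small illustration of the two quantities just defined, the following sketch computes the entropy and the informational variance of a probability vector. Logarithms are taken to base 2 here, which is an assumption made for readability (any fixed base rescales both quantities consistently), and the function names are hypothetical. The two test vectors are the ones used for Meshalkin's example later in the paper.

```python
from math import log2

def entropy(p):
    """Entropy h(p) = sum_i -p_i * log2(p_i), in bits (base 2 is an assumption)."""
    return sum(-pi * log2(pi) for pi in p)

def informational_variance(p):
    """Informational variance sum_i p_i * (-log2(p_i) - h(p))**2."""
    h = entropy(p)
    return sum(pi * (-log2(pi) - h) ** 2 for pi in p)

# Vectors from Meshalkin's example: equal entropy, different variance.
p = [1 / 2] + [1 / 8] * 4
q = [1 / 4] * 4

print(entropy(p), entropy(q))
print(informational_variance(p), informational_variance(q))
```

Both vectors have entropy 2 bits, while the informational variances differ, so this pair falls under the hypothesis of Theorem 1.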

Theorem 1 Let $p$ and $q$ be probability vectors that satisfy $h(p) = h(q)$ and
$\sigma_p^2 \neq \sigma_q^2$. Then there exists a constant $C_{p,q} > 0$ such that for any finitary
homomorphism $\varphi$ from $B(p)$ to $B(q)$, we have

$$\liminf_{n\to\infty} \frac{E(N_\varphi \wedge n)}{\sqrt{n}} \ge C_{p,q},$$

and consequently, $E\bigl(N_\varphi^{1/2}\bigr) = \infty$.

(Here and throughout, $E$ denotes expectation with respect to $P = p^{\mathbb{Z}}$.)
The exponent 1/2 in the theorem is sharp, since Meshalkin [5] (see §3)
constructed a finitary isomorphism $\varphi$ from $B(p)$ for $p = (\frac12, \frac18, \frac18, \frac18, \frac18)$ to
$B(q)$ for $q = (\frac14, \frac14, \frac14, \frac14)$, where $P[N_\varphi > k]$ equals the probability that a
simple random walk remains positive for $k$ steps. Thus for Meshalkin's code,
$0 < \lim_k P[N_\varphi > k]\sqrt{k} < \infty$, whence $E\bigl(N_\varphi^\theta\bigr) < \infty$ for all $\theta < 1/2$. Clearly
$\sigma_q^2 < \sigma_p^2$ in this case, so Meshalkin's code is essentially optimal.

The assumption that $\sigma_p^2 \neq \sigma_q^2$ in Theorem 1 cannot be dropped, as shown
by our next result.

Theorem 2 For any $0 < \theta < 1$, there are probability vectors $p$ and $q$ where
$p$ is not a permutation of $q$, such that there exists a finitary isomorphism $\Phi$
from $B(p)$ to $B(q)$ that satisfies $E\bigl(N_\Phi^\theta\bigr) < \infty$ and $E_Q\bigl(N_{\Phi^{-1}}^\theta\bigr) < \infty$.

Theorem 1 is proved in the next section. In §3 we recall Meshalkin's
isomorphism, and describe an adaptation of Meshalkin's code which motivates
Theorem 2. In §4 we define a class of matchings useful for the proof of
Theorem 2, and in §5 we prove the theorem. In §6 we define informational
variance for ergodic Markov chains, and present an extension of Theorem 1
to this setting.

2 Proof of Theorem 1 

With the notation of the introduction in force, we may assume that the
probability vectors $p$ and $q$ satisfy $p_i > 0$ for all $0 \le i < a$ and $q_j > 0$ for
all $0 \le j < b$. Let $\varphi$ be a finitary homomorphism from $B(p)$ to $B(q)$. For
$x = (x_k)_{k\in\mathbb{Z}} \in X$, write $X_i(x) = -\log(p(x_i)) - h(p)$, where $p(\alpha_j) = p_j$ for
any $j$. Similarly, if $\varphi(x) = y = (y_k)_{k\in\mathbb{Z}} \in Y$, let $Y_i(x) = -\log(q(y_i)) - h(q)$.
Since $P\varphi^{-1} = Q$, it follows that $E(X_i) = E(Y_i) = 0$. Let $S_n = \sum_{i=1}^n X_i$ and
$R_n = \sum_{i=1}^n Y_i$. Write $t^+ = \max\{t, 0\}$.



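Lemma 3 If $\sigma_p^2 \neq \sigma_q^2$, then

$$\liminf_{n\to\infty} \frac{1}{\sqrt{n}} E(R_n - S_n)^+ \ge \frac{|\sigma_p - \sigma_q|}{\sqrt{2\pi}}.$$

Proof. By a version of the central limit theorem (see [12], Cor. 2.1.9),

$$\lim_{n\to\infty} E\Bigl(\frac{R_n^+}{\sqrt{n}}\Bigr) = \frac{\sigma_q}{\sqrt{2\pi}} \int_0^\infty t e^{-t^2/2}\,dt = \frac{\sigma_q}{\sqrt{2\pi}},$$

and similarly

$$\lim_{n\to\infty} E\Bigl(\frac{S_n^+}{\sqrt{n}}\Bigr) = \frac{\sigma_p}{\sqrt{2\pi}}.$$

Since $(R_n - S_n)^+ \ge R_n^+ - S_n^+$, we infer that

$$\liminf_{n\to\infty} \frac{1}{\sqrt{n}} E(R_n - S_n)^+ \ge \frac{\sigma_q - \sigma_p}{\sqrt{2\pi}} \tag{1}$$

and similarly

$$\liminf_{n\to\infty} \frac{1}{\sqrt{n}} E(S_n - R_n)^+ \ge \frac{\sigma_p - \sigma_q}{\sqrt{2\pi}}. \tag{2}$$

If $\sigma_q > \sigma_p$, then (1) proves the lemma. In the remaining case, $\sigma_p > \sigma_q$,
the assertion of the lemma follows from (2) by taking expectations in the
identity

$$(R_n - S_n)^+ = (R_n - S_n) + (S_n - R_n)^+,$$

since $E(R_n - S_n) = 0$.

□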

Lemma 4 Let $\varphi$ be a finitary homomorphism from $B(p)$ to $B(q)$. Denote
$\lambda_q = \max\{-\log(q_j) : 0 \le j \le b-1\}$. Then for all $n$,

$$E(R_n - S_n)^+ \le 2\lambda_q E(N_\varphi \wedge n).$$

Proof. Let

$$I_n = I_n(x) = \bigl\{i \in \{1, \dots, n\} : N_\varphi(T^i x) \ge \min\{i, n+1-i\}\bigr\}$$

and denote $J_n = \{1, \dots, n\} \setminus I_n$. Observe that

$$E|I_n| = \sum_{i=1}^n P(i \in I_n) \le 2\sum_{t=1}^n P(N_\varphi \ge t) \le 2E(N_\varphi \wedge n). \tag{3}$$

Fix $\bar{x} \in X$ and let $\bar{y} = \varphi(\bar{x})$. Since

$$\{x \in X : (x_1, \dots, x_n) = (\bar{x}_1, \dots, \bar{x}_n)\} \subset \varphi^{-1}\{y \in Y : y_j = \bar{y}_j \ \forall j \in J_n\},$$

it follows that

$$P\{x \in X : (x_1, \dots, x_n) = (\bar{x}_1, \dots, \bar{x}_n)\} \le Q\{y \in Y : y_j = \bar{y}_j \ \forall j \in J_n\}.$$

Taking logarithms, this implies that

$$\sum_{k=1}^n \log p(\bar{x}_k) \le \sum_{k=1}^n \log q(\bar{y}_k) - \sum_{i \in I_n} \log q(\bar{y}_i) \le \sum_{k=1}^n \log q(\bar{y}_k) + \lambda_q |I_n|.$$

Since $h(p) = h(q)$, we deduce from the last equation and the definitions
of $R_n$ and $S_n$ that $R_n - S_n \le \lambda_q |I_n|$, whence by (3),

$$E(R_n - S_n)^+ \le \lambda_q E|I_n| \le 2\lambda_q E(N_\varphi \wedge n).$$

□



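Proof of Theorem 1. Lemmas 3 and 4 imply that

$$\liminf_{n\to\infty} \frac{E(N_\varphi \wedge n)}{\sqrt{n}} \ge \frac{|\sigma_p - \sigma_q|}{2\lambda_q\sqrt{2\pi}} > 0, \tag{4}$$

so it only remains to verify the final assertion of the theorem.
Observe that $N_\varphi \wedge n \le \sqrt{N_\varphi n}$ and $(N_\varphi \wedge n)/\sqrt{n} \to 0$ $P$-a.e.
If we had $E\bigl(\sqrt{N_\varphi}\bigr) < \infty$, then we could deduce by dominated convergence
that $E(N_\varphi \wedge n)/\sqrt{n} \to 0$, which contradicts (4). Thus $E\bigl(\sqrt{N_\varphi}\bigr) = \infty$. □

A similar idea was used in a different context by Liggett [4].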



3 Motivating examples and heuristics 

Meshalkin's coding 

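First, we briefly recall the Meshalkin isomorphism [5]. Let $B(r)$ be the
Bernoulli shift on the alphabet $A_1 = \{\alpha_1, \dots, \alpha_5\}$ for $r = (\frac12, \frac18, \frac18, \frac18, \frac18)$
and let $B(s)$ be the Bernoulli shift on the alphabet $B_1 = \{\beta_1, \dots, \beta_4\}$ for
$s = (\frac14, \frac14, \frac14, \frac14)$. We represent the symbols of $A_1$ as

$$\alpha_1 = \begin{pmatrix} 0 \end{pmatrix},\quad \alpha_2 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix},\quad \alpha_3 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix},\quad \alpha_4 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix},\quad \alpha_5 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.$$

The symbols of $B_1$ are represented as:

$$\beta_1 = \begin{pmatrix} 0 \\ 0 \end{pmatrix},\quad \beta_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix},\quad \beta_3 = \begin{pmatrix} 1 \\ 0 \end{pmatrix},\quad \beta_4 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

The Meshalkin finitary isomorphism from $B(r)$ to $B(s)$ can be described
in two equivalent ways. Given a sequence $x = (x_j)_{j\in\mathbb{Z}} \in A_1^{\mathbb{Z}}$, denote by $\ell_i$
the length of the binary representation of $x_i \in A_1$. The random walk
description of $\varphi$ is obtained by defining, for each $i$ with $\ell_i = 1$,

$$m(i) = \min\Bigl\{m > i : \sum_{j=i}^{m} (\ell_j - 2) = 0\Bigr\}. \tag{5}$$

Observe that $m(\cdot)$ is an injective map from $\{i \in \mathbb{Z} : \ell_i = 1\}$ onto $\{j \in \mathbb{Z} :
\ell_j = 3\}$. For each $i \in \mathbb{Z}$ with $\ell_i = 1$, remove the bottom bit from $x_{m(i)}$ and
append it at the bottom of $x_i$. This produces two symbols from $B_1$ that are
denoted $y_{m(i)}$ and $y_i$, respectively. Set $\varphi(x) = y = (y_j)_{j\in\mathbb{Z}}$.

Alternatively, we have an equivalent inductive construction of $\varphi$:

Step 1: For each $i \in \mathbb{Z}$ such that $\ell_i = 1$ and $\ell_{i+1} = 3$, send the bottom
bit of $x_{i+1}$ below $x_i$, output the resulting $B_1$ symbols and remove from
consideration both $i$ and $i + 1$.

For each $n \ge 2$, perform:

Step $n$: For all $i \in \mathbb{Z}$ such that $\ell_i = 1$, $\ell_{i+n} = 3$ and $i, i + n$ have not been
removed from consideration, send the bottom bit of $x_{i+n}$ below $x_i$, output
the corresponding $B_1$ symbols and remove from consideration both $i$ and
$i + n$.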
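The random walk description can be tried out on a finite window. The sketch below is illustrative only: the function name and the tuple encoding are hypothetical, with a length-1 symbol written as `(0,)` and a length-3 symbol as `(1, b1, b2)` read from top to bottom. Since the true code acts on bi-infinite sequences, positions whose partner lies outside the window are left as `None`.

```python
def meshalkin_window(x):
    """Apply the matching rule m(i) of (5) inside a finite window.

    x is a list of bit tuples: (0,) has length 1, (1, b1, b2) has length 3.
    Returns a list of 2-bit tuples, with None where the partner of a
    position falls outside the window.
    """
    y = [None] * len(x)
    for i, sym in enumerate(x):
        if len(sym) != 1:
            continue
        walk = 0
        for m in range(i, len(x)):
            walk += len(x[m]) - 2          # -1 for length 1, +1 for length 3
            if walk == 0:                  # first return to 0, so m = m(i)
                y[i] = (sym[0], x[m][2])   # bottom bit of x_m goes below x_i
                y[m] = x[m][:2]            # x_m with its bottom bit removed
                break
    return y

window = [(0,), (1, 0, 1), (0,), (0,), (1, 1, 0), (1, 1, 1)]
coded = meshalkin_window(window)
print(coded)
```

In this window the positions pair up like nested parentheses: 0 with 1, 3 with 4, and 2 with 5, and every output symbol has exactly two bits.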

An adaptation of Meshalkin's coding 

Next we describe informally a variant of the coding above, which we will
generalize in §5 to prove Theorem 2. Consider the random walk where each
increment $X_i$ has $P(X_i = 1) = P(X_i = 3) = \frac12$. The moment generating
function is

$$\Gamma(z) = E\bigl(z^{X_0}\bigr) = \frac{z + z^3}{2}.$$

Consider also the walk where each increment $Y_i$ equals 2 with probability 1.
This has moment generating function

$$\Lambda(z) = z^2.$$

These walks count the accumulated information for the Bernoulli shifts $B(r)$
and $B(s)$, where $r = (\frac12, \frac18, \frac18, \frac18, \frac18)$ and $s = (\frac14, \frac14, \frac14, \frac14)$ as in Meshalkin's
coding. The entropy equality $h(r) = h(s)$ corresponds to the identity $\Gamma'(1) =
\Lambda'(1)$, while the inequality of informational variance corresponds to the
inequality $\Gamma''(1) \neq \Lambda''(1)$. The identity

$$\Gamma^2(z) - \Lambda^2(z) = \tfrac12\bigl(\Gamma(z^2) - \Lambda(z^2)\bigr)$$

underlies the construction below. We add markers $\alpha_0$ and $\beta_0$, respectively, to
the alphabets $A_1$ and $B_1$ described above. Let $B(p)$ be the Bernoulli shift on
the alphabet $A = \{\alpha_0, \dots, \alpha_5\} = \{\alpha_0\} \cup A_1$, with associated probability
vector $p = (\frac12, \frac14, \frac1{16}, \frac1{16}, \frac1{16}, \frac1{16})$. Let $B(q)$ be the Bernoulli shift on the alphabet
$B = \{\beta_0, \dots, \beta_4\}$ with associated probability vector $q = (\frac12, \frac18, \frac18, \frac18, \frac18)$.
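The generating-function identity for $\Gamma(z) = (z + z^3)/2$ and $\Lambda(z) = z^2$ can be checked mechanically. The following sketch uses a hypothetical representation of polynomials as dictionaries from exponents to exact `Fraction` coefficients; it also evaluates the first two derivatives at 1, which encode the entropy equality and the variance inequality.

```python
from fractions import Fraction

def poly_mul(f, g):
    """Product of two polynomials stored as {exponent: coefficient}."""
    out = {}
    for a, ca in f.items():
        for b, cb in g.items():
            out[a + b] = out.get(a + b, 0) + ca * cb
    return out

def poly_sub(f, g):
    """Difference f - g, dropping zero coefficients."""
    out = dict(f)
    for b, cb in g.items():
        out[b] = out.get(b, 0) - cb
    return {k: v for k, v in out.items() if v != 0}

def at_z_squared(f):
    """Substitute z**2 for z."""
    return {2 * k: c for k, c in f.items()}

def derivative_at_one(f, order):
    """Value of the derivative of the given order at z = 1."""
    total = 0
    for k, c in f.items():
        term = c
        for j in range(order):
            term *= k - j
        total += term
    return total

gamma = {1: Fraction(1, 2), 3: Fraction(1, 2)}   # (z + z^3) / 2
lam = {2: Fraction(1)}                           # z^2

lhs = poly_sub(poly_mul(gamma, gamma), poly_mul(lam, lam))
diff = poly_sub(at_z_squared(gamma), at_z_squared(lam))
rhs = {k: Fraction(1, 2) * c for k, c in diff.items()}
print(lhs == rhs)
```

Both sides equal $(z^2 - 2z^4 + z^6)/4$; the first derivatives at 1 agree (both are 2), while the second derivatives are 3 and 2.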

Next we construct $\Phi$, a finitary isomorphism from $B(p)$ to $B(q)$:

Step 0: If $x_i = \alpha_0$, let $(\Phi(x))_i = \beta_0$; that is, send markers to markers.

Step 1: Match the non-marker locations in pairs. Suppose that $i$ is paired
with $j$. If $\ell_i \neq \ell_j$, we can assume that $\ell_i = 1$ and $\ell_j = 3$ (otherwise reverse
the roles). Remove the bottom bit of $x_j$ and append it below $x_i$, output the
resulting $B$ symbols, and remove from consideration both $i$ and $j$. If
$\ell_i = \ell_j$, then do not remove $i$ and $j$ from consideration.

For each $n \ge 2$, perform:

Step $n$: The locations which we have not removed from consideration are
grouped in $2^{n-1}$-tuples. Each such $2^{n-1}$-tuple is either of type 3 (which we
define to mean that for every location $i$ within the tuple $\ell_i = 3$), or of type
1. Using the markers, match the $2^{n-1}$-tuples which have not been removed
from consideration in pairs to form $2^n$-tuples. If a $2^{n-1}$-tuple $\xi_3$ of type 3 is
matched with a $2^{n-1}$-tuple $\xi_1$ of type 1, remove the bottom bit from each $x_i$
in $\xi_3$, and append it to the corresponding symbol in $\xi_1$. Finally, output the
symbols of $B$ thus generated, and remove these locations from consideration.

The coding length for the isomorphism described above has essentially
the same tails as Meshalkin's. To explain this, let $F_k$ be the event that a
symbol at the origin is not coded during the first $k$ pairing stages; its probability
is approximately $2^{-k}$ (the approximation is due to parity problems caused by
markers). After the $k$-th pairing stage, only about $1/2^k$ of the symbols remain
uncoded, and these symbols are grouped into $2^k$-tuples. Thus heuristically,
the event $F_k$ corresponds to an expected coding distance of order $4^k$. This
suggests that $P(N_\Phi > t) \asymp t^{-1/2}$. Indeed, for this example, Theorem 1 implies
that $E\bigl(N_\Phi^{1/2}\bigr) = \infty$ and the proof of Theorem 2 will show that $E\bigl(N_\Phi^\theta\bigr) < \infty$
for all $\theta < 1/2$.

An example with $3/4 - \varepsilon$ moments: heuristics. Consider different
probability vectors $p$ and $q$, chosen so that the random walks counting the
accumulated information of non-marker symbols have moment generating
functions

$$\Gamma(z) = \frac{z^3 + 6z^5 + z^7}{8} \tag{6}$$

and

$$\Lambda(z) = \frac{z^4 + z^6}{2}, \tag{7}$$

respectively. Then

$$\Gamma^2(z) - \Lambda^2(z) = \tfrac18\bigl(\Gamma(z^2) - \Lambda(z^2)\bigr).$$

This example is the case $n = 2$ of the sequence of examples analyzed in §5;
see (11) and (12).

Define a finitary coding $\Phi$ from $B(p)$ to $B(q)$ by adapting the recipe above
(see §4 and §5 for details). To estimate the tails of $N_\Phi$, start by observing
that the probability that a symbol is not coded during the first $k$ pairing
stages is about $8^{-k}$. At that stage, symbols are grouped into $2^k$-tuples, and
only $1/8^k$ of them remain uncoded, so heuristically, this event corresponds
to an expected coding distance of order $16^k$. This suggests that

$$P(N_\Phi > t) \asymp t^{-3/4}.$$

Indeed, for this example we will show in §5 that $E\bigl(N_\Phi^\theta\bigr) < \infty$ for all $\theta < 3/4$.
This is consistent with Theorem 1 since the identities $\Gamma'(1) = \Lambda'(1)$ and
$\Gamma''(1) = \Lambda''(1)$ indicate that $p$ and $q$ have the same entropy and the same
informational variance.
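The arithmetic behind both tail heuristics is the same and can be written out in a few lines. In the sketch below (hypothetical function and parameter names), a fraction $2^{-ak}$ of symbols survives $k$ pairing stages and survivors sit in $2^k$-tuples, so the typical coding distance is $2^{(a+1)k}$ and the suggested tail is $t^{-a/(a+1)}$. This only restates the heuristic; it proves nothing about the actual codes.

```python
from fractions import Fraction

def heuristic_tail_exponent(survival_bits, tuple_bits=1):
    """Exponent e in the heuristic P(N > t) ~ t**(-e).

    After k stages a fraction 2**(-survival_bits * k) of symbols is uncoded,
    grouped in tuples of size 2**(tuple_bits * k), so the coding distance is
    of order 2**((survival_bits + tuple_bits) * k).
    """
    return Fraction(survival_bits, survival_bits + tuple_bits)

# Survival 2**(-k) with distance 4**k, and survival 8**(-k) with distance 16**k.
print(heuristic_tail_exponent(1), heuristic_tail_exponent(3))
```

The two calls reproduce the exponents 1/2 and 3/4 discussed in this section.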



4 Ordered measure preserving matchings 

In this section, we define a type of matching which we will employ in our
constructions in §5, and derive some useful properties of these matchings.
Let $C = \{\gamma_1, \dots, \gamma_c\}$ and $D = \{\delta_1, \dots, \delta_d\}$ be finite alphabets, and let $r =
(r(\gamma_1), \dots, r(\gamma_c))$ and $s = (s(\delta_1), \dots, s(\delta_d))$ be probability vectors. Let

$$\Gamma_k = \Gamma(k, C, r) = \sum_{i=1}^{c} \{r(\gamma_i) : r(\gamma_i) = 2^{-k}\}$$

and

$$\Lambda_k = \Lambda(k, D, s) = \sum_{i=1}^{d} \{s(\delta_i) : s(\delta_i) = 2^{-k}\}.$$

Define an order relation $\prec$ on $C$ such that $\gamma_1 \prec \cdots \prec \gamma_c$ and an order
relation $\prec$ on $D$ such that $\delta_1 \prec \cdots \prec \delta_d$. Endow $C \times C$ with the lexicographic
ordering, i.e., define $\gamma_i\gamma_j \prec \gamma_m\gamma_n$ if $\gamma_i \prec \gamma_m$ or if $\gamma_i = \gamma_m$ and $\gamma_j \prec \gamma_n$.
Similarly, endow $D \times D$ with the lexicographic ordering $\prec$.

Let $r(\gamma_i\gamma_j) = r(\gamma_i)r(\gamma_j)$ and $s(\delta_i\delta_j) = s(\delta_i)s(\delta_j)$. We define the maximal
ordered measure preserving matching (mompm) $\psi = \psi_{(C,D,r,s)}$ from
$C \times C$ to $D \times D$ given $(r, s)$ as follows:

For all $t \in \mathbb{R}$, write the ordered set $\{x \in C \times C : r(x) = t\}$ in increasing
order as $\{x_t(i) : 1 \le i \le \ell_t\}$, and similarly, write the ordered set $\{y \in D \times D :
s(y) = t\}$ in increasing order as $\{y_t(i) : 1 \le i \le m_t\}$, assuming these sets are
non-empty. Define $\psi(x_t(i)) = y_t(i)$ for $1 \le i \le \min\{\ell_t, m_t\}$.

Let $E = E(C, D, r, s)$ be the set in $C \times C$ where $\psi$ is defined. Let
$F = F(C, D, r, s) = \psi(E)$. Let $G = G(C, D, r, s) = C \times C - E$. Let
$H = H(C, D, r, s) = D \times D - F$.
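To make the definition concrete, here is a minimal sketch of the matching for vectors given as lists in the chosen order on $C$ and $D$. The function name `mompm` and the use of index pairs to stand for elements of $C \times C$ and $D \times D$ are illustrative assumptions; exact `Fraction` values are used so that equality of masses is tested without rounding.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def mompm(r, s):
    """Maximal ordered measure preserving matching, on index pairs.

    r, s: probability vectors listed in the order chosen on C and D.
    Returns (psi, G, H): psi maps pairs of C x C to pairs of D x D of equal
    mass; G and H are the sorted lists of unmatched pairs.
    """
    def pairs_by_mass(v):
        groups = defaultdict(list)
        # product(...) enumerates pairs in lexicographic order
        for i, j in product(range(len(v)), repeat=2):
            groups[v[i] * v[j]].append((i, j))
        return groups

    xs, ys = pairs_by_mass(r), pairs_by_mass(s)
    psi = {}
    for mass, x_list in xs.items():
        # zip stops at the shorter list, i.e. after min(l_t, m_t) pairs
        for x, y in zip(x_list, ys.get(mass, [])):
            psi[x] = y
    matched = set(psi.values())
    G = sorted(x for x_list in xs.values() for x in x_list if x not in psi)
    H = sorted(y for y_list in ys.values() for y in y_list if y not in matched)
    return psi, G, H

# Non-marker vectors of Meshalkin's example.
r = [Fraction(1, 2)] + [Fraction(1, 8)] * 4
s = [Fraction(1, 4)] * 4
psi, G, H = mompm(r, s)
print(len(psi), len(G), len(H))
```

For these vectors only the eight pairs of mass 1/16 are matched; the unmatched sets $G$ and $H$ each carry total mass 1/2.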

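Let $\tilde{r} = \tilde{r}_{(C,D,r,s)} = (\tilde{r}(x) : x \in G)$, where $\tilde{r}(x) = r(x) / \sum_{w \in G} r(w)$, and $\tilde{s} =
\tilde{s}_{(C,D,r,s)} = (\tilde{s}(y) : y \in H)$, where $\tilde{s}(y) = s(y) / \sum_{w \in H} s(w)$, be the probability
vectors induced by $(r, s)$ on $G$ and $H$.

Let

$$T_k = T(k, C, r) = \sum_{x \in C \times C} \{r(x) : r(x) = 2^{-k}\},$$

$$\Pi_k = \Pi(k, D, s) = \sum_{y \in D \times D} \{s(y) : s(y) = 2^{-k}\},$$

$$\Delta_k = \Delta(k, C, D, r, s) = \sum_{x \in G} \{r(x) : r(x) = 2^{-k}\},$$

and let

$$\Sigma_k = \Sigma(k, C, D, r, s) = \sum_{y \in H} \{s(y) : s(y) = 2^{-k}\}.$$

We say that $\psi$ reduces mass by a factor of $t$ if $\sum_{k=0}^{\infty} \Delta_k = t$.

Let

$$\Gamma(z) = \Gamma(C, D, r, s, z) = \sum_{k=0}^{\infty} \Gamma_k z^k. \tag{8}$$

Define $\Lambda(z)$, $T(z)$, $\Pi(z)$, $\Delta(z)$, and $\Sigma(z)$ analogously. Then $T(z) = \Gamma^2(z)$
and $\Pi(z) = \Lambda^2(z)$. Also, $\Delta(z) - \Sigma(z) = T(z) - \Pi(z)$.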

Lemma 5 Suppose $\Gamma^2(z) - \Lambda^2(z) = t\Gamma(z^2) - t\Lambda(z^2)$. Then:

(i) $\Delta(z) = t\Gamma(z^2)$ and $\Sigma(z) = t\Lambda(z^2)$.

(ii) $\psi$ reduces mass by a factor of $t$.

Proof.

(i) $\Delta(z) - \Sigma(z) = T(z) - \Pi(z) = \Gamma^2(z) - \Lambda^2(z) = t\Gamma(z^2) - t\Lambda(z^2)$,
hence

$$\Delta(z) = t\Gamma(z^2) \tag{9}$$

and

$$\Sigma(z) = t\Lambda(z^2). \tag{10}$$

(ii) By (9),

$$\sum_{k=0}^{\infty} \Delta_k = t \sum_{k=0}^{\infty} \Gamma_k = t.$$

□

Let $C_1 = C$, let $D_1 = D$, let $r_1 = r$, and let $s_1 = s$. Inductively,
let $C_{i+1} = G(C_i, D_i, r_i, s_i)$, let $D_{i+1} = H(C_i, D_i, r_i, s_i)$, let $r_{i+1} = \tilde{r}_{(C_i, D_i, r_i, s_i)}$,
and let $s_{i+1} = \tilde{s}_{(C_i, D_i, r_i, s_i)}$. Let $\psi_i = \psi_{(C_i, D_i, r_i, s_i)}$. Note that $\psi_i$ matches
$2^i$-tuples to $2^i$-tuples. We call $\{\psi_i\}_{i \ge 1}$ the sequence of mompm's associated
to $(C, D, r, s)$. Let $\Gamma_i(z) = \Gamma(C_i, D_i, r_i, s_i, z)$. In particular, $\Gamma_1(z) = \Gamma(z)$
as defined in equation (8). Define $\Lambda_i(z)$, $T_i(z)$, $\Pi_i(z)$, $\Delta_i(z)$, and $\Sigma_i(z)$
analogously.

Inductive application of Lemma 5 gives:

Corollary 6 Suppose $\Gamma_1^2(z) - \Lambda_1^2(z) = t\Gamma_1(z^2) - t\Lambda_1(z^2)$.

(i) If $i \in \mathbb{Z}_+$, then $\Gamma_i^2(z) - \Lambda_i^2(z) = t\Gamma_i(z^2) - t\Lambda_i(z^2)$.

(ii) If $i \in \mathbb{Z}_+$, then $\psi_i$ reduces mass by a factor of $t$.
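5 A class of codes with finite moments

Finally, we construct a class of examples to prove Theorem 2.
Fix $n \in \mathbb{Z}_+$.

Let $p_0 = q_0 = \frac12$. Construct $p = (\frac12, p_1, \dots, p_{a-1})$ such that for each
integer $m \in [0, n]$, exactly $2^{2m}\binom{2n}{2m}$ of the $p_i$ take the value $2^{-2m-2n}$. Thus if
for $i \ge 1$, we denote $r(\alpha_i) = 2p_i$, then

$$\Gamma_{2m+2n-1} = \sum_{i=1}^{a-1} \{r(\alpha_i) : r(\alpha_i) = 2^{-(2m+2n-1)}\} = \frac{1}{2^{2n-1}} \binom{2n}{2m} \tag{11}$$

for all $m \in \mathbb{Z}$ such that $0 \le m \le n$. Define $\Gamma_k = 0$ for all other $k$.

Similarly, for $j \ge 1$, denote $s(\beta_j) = 2q_j$ and construct $q = (\frac12, q_1, \dots, q_{b-1})$
such that

$$\Lambda_{2m+2n} = \sum_{j=1}^{b-1} \{s(\beta_j) : s(\beta_j) = 2^{-(2m+2n)}\} = \frac{1}{2^{2n-1}} \binom{2n}{2m+1} \tag{12}$$

for all $m \in \mathbb{Z}$ such that $0 \le m \le n - 1$, and define $\Lambda_k = 0$ for other $k$.

Let $B(p)$ be the Bernoulli shift with probability vector $p$ on the alphabet
$A = \{\alpha_0, \dots, \alpha_{a-1}\}$. Let $B(q)$ be the Bernoulli shift with probability vector
$q$ on the alphabet $B = \{\beta_0, \dots, \beta_{b-1}\}$.

Let $C = \{\alpha_1, \dots, \alpha_{a-1}\}$ and let $D = \{\beta_1, \dots, \beta_{b-1}\}$. Consider the
probability vectors $r = (r(\alpha_i) : 1 \le i \le a - 1)$ and $s = (s(\beta_j) : 1 \le j \le b - 1)$.
Relative to these, define all other terms as in §4.

Lemma 7 If $i \in \mathbb{Z}_+$, then $\psi_i$ reduces mass by a factor of $\frac{1}{2^{2n-1}}$.

Proof. Recall that $\Gamma(z) = \sum_{k=0}^{\infty} \Gamma_k z^k$ and $\Lambda(z) = \sum_{k=0}^{\infty} \Lambda_k z^k$. By the
binomial theorem and equations (11) and (12), we find that

$$\Gamma^2(z) - \Lambda^2(z) = \bigl(\Gamma(z) - \Lambda(z)\bigr)\bigl(\Gamma(z) + \Lambda(z)\bigr)$$

$$= \frac{z^{2n-1}}{2^{2n-1}} (1 - z)^{2n} \cdot \frac{z^{2n-1}}{2^{2n-1}} (1 + z)^{2n}$$

$$= \frac{1}{2^{2n-1}} \cdot \frac{z^{4n-2}}{2^{2n-1}} (1 - z^2)^{2n} = \frac{1}{2^{2n-1}} \bigl(\Gamma(z^2) - \Lambda(z^2)\bigr),$$

so the desired result holds by Corollary 6. □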
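The polynomial identity used in this proof can be double-checked for small $n$ with exact arithmetic. The sketch below assumes the coefficients $\Gamma_{2m+2n-1} = \binom{2n}{2m}/2^{2n-1}$ and $\Lambda_{2m+2n} = \binom{2n}{2m+1}/2^{2n-1}$; these indices were inferred from the $n = 1$ and $n = 2$ examples in this paper and should be read as an assumption. Both sides of the identity have degree at most $8n - 2$, so agreement at $8n$ distinct rational points implies equality as polynomials. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def gamma_coeffs(n):
    """Coefficients {k: Gamma_k}, under the indexing assumption stated above."""
    return {2 * m + 2 * n - 1: Fraction(comb(2 * n, 2 * m), 2 ** (2 * n - 1))
            for m in range(n + 1)}

def lambda_coeffs(n):
    """Coefficients {k: Lambda_k}, under the same assumption."""
    return {2 * m + 2 * n: Fraction(comb(2 * n, 2 * m + 1), 2 ** (2 * n - 1))
            for m in range(n)}

def evaluate(coeffs, z):
    return sum(c * z ** k for k, c in coeffs.items())

def identity_holds(n):
    """Check Gamma^2 - Lambda^2 = t (Gamma(z^2) - Lambda(z^2)), t = 2^-(2n-1)."""
    g, lam = gamma_coeffs(n), lambda_coeffs(n)
    t = Fraction(1, 2 ** (2 * n - 1))
    points = [Fraction(k, 7) for k in range(1, 8 * n + 1)]  # 8n distinct points
    return all(
        evaluate(g, z) ** 2 - evaluate(lam, z) ** 2
        == t * (evaluate(g, z * z) - evaluate(lam, z * z))
        for z in points
    )

print([identity_holds(n) for n in range(1, 5)])
```

For $n = 1$ this recovers the factor 1/2 from §3, and for $n = 2$ the factor 1/8.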
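Example

When $n = 2$, we may let $A = \{\alpha_0, \dots, \alpha_{41}\}$; $B = \{\beta_0, \dots, \beta_{40}\}$; $p =
(p_0, \dots, p_{41})$ such that $p_0 = 2^{-1}$, $p_1 = 2^{-4}$, $p_2 = \cdots = p_{25} = 2^{-6}$, and
$p_{26} = \cdots = p_{41} = 2^{-8}$; and $q = (q_0, \dots, q_{40})$ such that $q_0 = 2^{-1}$, $q_1 = \cdots =
q_8 = 2^{-5}$ and $q_9 = \cdots = q_{40} = 2^{-7}$.

Taking logarithms to base 2, $h(p) = h(q) = \frac72$ and $\sigma_p^2 = \sigma_q^2 = \frac{27}{4}$, hence
Theorem 1 does not apply. These vectors correspond to the generating
functions in (6) and (7). We find that

$$(\Gamma_3, \Gamma_5, \Gamma_7) = \bigl(\tfrac18, \tfrac34, \tfrac18\bigr) \tag{13}$$

$$(\Lambda_4, \Lambda_6) = \bigl(\tfrac12, \tfrac12\bigr) \tag{14}$$

$$(T_6, T_8, T_{10}, T_{12}, T_{14}) = \bigl(\tfrac1{64}, \tfrac3{16}, \tfrac{19}{32}, \tfrac3{16}, \tfrac1{64}\bigr) \tag{15}$$

$$(\Pi_8, \Pi_{10}, \Pi_{12}) = \bigl(\tfrac14, \tfrac12, \tfrac14\bigr) \tag{16}$$

$$(\Delta_6, \Delta_{10}, \Delta_{14}) = \bigl(\tfrac1{64}, \tfrac3{32}, \tfrac1{64}\bigr) \tag{17}$$

$$(\Sigma_8, \Sigma_{12}) = \bigl(\tfrac1{16}, \tfrac1{16}\bigr). \tag{18}$$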
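The vectors of this example can be generated for any $n$ and checked exactly. The sketch below stores each probability through its exponent $k = -\log_2$ of the probability, so all arithmetic stays in `Fraction`. The counts $2^{2m}\binom{2n}{2m}$ and $2^{2m+1}\binom{2n}{2m+1}$ follow from (11) and (12) as read here, which is an assumption about the garbled source; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def exponents_p(n):
    """Exponents -log2(p_i), marker first, for the vector p of Section 5."""
    ks = [1]
    for m in range(n + 1):
        ks += [2 * m + 2 * n] * (2 ** (2 * m) * comb(2 * n, 2 * m))
    return ks

def exponents_q(n):
    """Exponents -log2(q_j), marker first, for the vector q of Section 5."""
    ks = [1]
    for m in range(n):
        ks += [2 * m + 2 * n + 1] * (2 ** (2 * m + 1) * comb(2 * n, 2 * m + 1))
    return ks

def total_mass(ks):
    return sum(Fraction(1, 2 ** k) for k in ks)

def entropy_bits(ks):
    return sum(Fraction(k, 2 ** k) for k in ks)

def info_variance(ks):
    h = entropy_bits(ks)
    return sum(Fraction(1, 2 ** k) * (k - h) ** 2 for k in ks)

kp, kq = exponents_p(2), exponents_q(2)
print(len(kp), len(kq), entropy_bits(kp), info_variance(kp), info_variance(kq))
```

For $n = 2$ this yields alphabets of sizes 42 and 41, entropy 7/2 and informational variance 27/4 for both vectors. For $n = 1$ it reproduces the six-symbol and five-symbol vectors of §3, whose variances differ.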
Definition of $\Phi$

For $x = (x_k)_{k\in\mathbb{Z}} \in X = A^{\mathbb{Z}}$, define a $j$-marker as a run of at least
$2nj$ consecutive $\alpha_0$ symbols. Define a $j$-gap as the location of the non-$\alpha_0$
symbols between neighboring $j$-markers.

Let $G_{j,0} = \{g(j, 0, 1), \dots, g(j, 0, \ell_{j,0})\}$ be the ordered elements (from left
to right) of the $j$-gap containing $\min\{i \ge 0 : x_i \neq \alpha_0\}$. More generally, let
$G_{j,i} = \{g(j, i, 1), \dots, g(j, i, \ell_{j,i})\}$ be the ordered elements of the $i$-th $j$-gap to
the right of $G_{j,0}$ (to the left if $i < 0$).

Step 0: If $x_i = \alpha_0$, let $(\Phi(x))_i = \beta_0$.

Step 1: Within each 1-gap, match the elements in pairs, starting from
the left ($g(1, i, 1)$ with $g(1, i, 2)$, $g(1, i, 3)$ with $g(1, i, 4)$, etc.). All the elements
will be paired except possibly $g(1, i, \ell_{1,i})$.

If $\psi_1(x_{g(1,i,2k+1)} x_{g(1,i,2k+2)})$ is defined, then let

$$(\Phi(x))_{g(1,i,2k+1)} (\Phi(x))_{g(1,i,2k+2)} = \psi_1(x_{g(1,i,2k+1)} x_{g(1,i,2k+2)})$$

and remove from consideration $g(1, i, 2k + 1)$ and $g(1, i, 2k + 2)$.

Starting from the left, match the pairs which have not been removed from
consideration into quartets. If $\psi_2$ of the symbols at the position of a quartet
is defined, output the result in the position of the quartet and remove the
elements of the quartet from consideration.

Iterate, matching $2^{k-1}$-tuples which have not been removed from consideration
into $2^k$-tuples and applying $\psi_k$, until $2^k > \ell_{1,i}$.

For each $j \ge 2$, do the following:

Step $j$: Within each $j$-gap, starting from the left, match into pairs any
elements in $G_{j,i}$ which were not paired in any of the previous steps, and apply
$\psi_1$ as in Step 1.

Match into quartets any previously unmatched pairs (including the pairs
just created) which have not been removed from consideration, and apply
$\psi_2$, etc., iterating until $2^k > \ell_{j,i}$.

When $n = 1$, this is the code described in §3. The next two lemmas are
needed as preparation for bounding the tails of $N_\Phi$.

Lemma 8 If $f(x) = \mathbf{1}_{[g(j,0,1) = 0]}$, then $E(f) = 2^{-2nj-1}$.

Proof. The sum $\sum_{m=1}^{M} f(T^m x)$ differs from the number of $j$-gaps in
$[1, M]$ by at most 2. Counting $j$-gaps in $[1, M]$ is equivalent to counting
runs of $2nj$ marker symbols followed by a non-marker symbol; such strings
have asymptotic frequency $2^{-2nj-1}$. Taking the limit of $\frac{1}{M}\sum_{m=1}^{M} f(T^m x)$ as
$M \to \infty$, the ergodic theorem yields the assertion. □

Lemma 9 For $j \ge 1$, let $L_{j,i} = 2nj + g(j, i, \ell_{j,i}) - g(j, i, 1)$ denote the "span"
of the $i$-th $j$-gap. If $\theta < 1$, then

$$E\bigl((L_{j,0} - L_{j-1,0})^\theta \mid x_0 \neq \alpha_0\bigr) \le 2^{(2+2nj)\theta}.$$

Proof. The expected distance between the beginnings of successive $j$-gaps
is $2^{1+2nj}$ by Kac's Theorem (see [10], p. 46), whence

$$E\bigl(L_{j,0} - L_{j-1,0} \mid x_0 \neq \alpha_0\bigr) \le 2^{2+2nj}.$$

The assertion of the lemma follows by Jensen's inequality. □
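Lemma 10 If $\theta < 1 - \frac{1}{2n}$, then $E\bigl(N_\Phi(x)^\theta\bigr) < \infty$.

Proof. Recall $L_{j,i}$ from the previous lemma and define $L_{0,i} = 0$. If Step $j$
determines $(\Phi(x))_0$, then $N_\Phi(x) \le L_{j,0}$. Let $A_j$ be the event that Steps 1 to
$j$ do not determine $(\Phi(x))_0$.

Let $B_j$ be the event that the 0-th coordinate is matched at least $j$ times
by the end of Step $j$, but $(\Phi(x))_0$ has not yet been determined. Let $C_j$ be
the event that at the end of Step $j$, the 0-th coordinate has been matched at
most $j - 1$ times (so it is not part of a $2^j$-tuple). Clearly, for each $j \ge 1$,

$$P(A_j) \le P(B_j) + P(C_j). \tag{19}$$

Every time an undetermined coordinate is matched, the probability that
it remains undetermined is $\frac{1}{2^{2n-1}}$, whence

$$P(B_j) \le \Bigl(\frac{1}{2^{2n-1}}\Bigr)^j. \tag{20}$$

Since, for all $k$ and $j$, at most one $2^k$-tuple in $G_{j,0}$ is unmatched at the
end of Step $j$, fewer than $2^j$ coordinates of $G_{j,0}$ have been matched at most
$j - 1$ times, and it follows that

$$P(C_j) \le 2^j \Bigl(\frac{1}{2^{2n}}\Bigr)^j = \Bigl(\frac{1}{2^{2n-1}}\Bigr)^j$$

by Lemma 8. Thus

$$P(A_j) \le 2\Bigl(\frac{1}{2^{2n-1}}\Bigr)^j.$$

Therefore

$$E\bigl(N_\Phi(x)^\theta\bigr) \le \sum_{j=1}^{\infty} P(A_{j-1}) E\bigl(L_{j,0}^\theta - L_{j-1,0}^\theta \mid A_{j-1}\bigr) \le \sum_{j=1}^{\infty} 2\Bigl(\frac{1}{2^{2n-1}}\Bigr)^{j-1} E\bigl((L_{j,0} - L_{j-1,0})^\theta \mid A_{j-1}\bigr).$$

Conditional on the event that $x_0 \neq \alpha_0$, the random variable $(L_{j,0} - L_{j-1,0})$
is independent of the event $A_{j-1}$, hence by Lemma 9,

$$E\bigl(N_\Phi(x)^\theta\bigr) \le \sum_{j=1}^{\infty} 2\Bigl(\frac{1}{2^{2n-1}}\Bigr)^{j-1} E\bigl((L_{j,0} - L_{j-1,0})^\theta \mid x_0 \neq \alpha_0\bigr) \le \sum_{j=1}^{\infty} 2\Bigl(\frac{1}{2^{2n-1}}\Bigr)^{j-1} 2^{(2+2nj)\theta} < \infty,$$

where the series converges because $2n\theta < 2n - 1$.

□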

A similar argument gives:

Lemma 11 If $\theta < 1 - \frac{1}{2n}$, then $E_Q\bigl(N_{\Phi^{-1}}^\theta\bigr) < \infty$.

Proof of Theorem 2. By Lemmas 10 and 11 it only remains to verify
that $\Phi$ is an isomorphism. Since $\Phi$ is finitary, it gives an a.e. defined map
from $B(p)$ to $B(q)$. As our definition of $(\Phi(x))_i$ depends only on the position
of $i$ within its $j$-blocks, $\Phi$ is translation invariant. Since each $\psi_k$ is a
one-to-one measure preserving matching from previously uncoded sequences to
previously uncoded sequences, it follows that $\Phi$ is measure preserving and
invertible. More precisely, for $P$-a.e. $\bar{x} \in X$ and any $n \ge 1$, all the symbols
in the string $\bar{x}_{-n} \dots \bar{x}_n$ get coded within a finite distance. This means
that the cylinder set $\{x \in X : (x_{-n}, \dots, x_n) = (\bar{x}_{-n}, \dots, \bar{x}_n)\}$ is partitioned
into countably many cylinder sets $C_j$ (and a set of measure zero); each $C_j$ is
mapped, using one of our matchings $\psi_{k(j)}$, to a cylinder set in $Y$ with
$Q(\Phi(C_j)) = P(C_j)$. This completes the proof.

□

6 Extension to ergodic Markov chains 

Let $A = \{\alpha_0, \dots, \alpha_{a-1}\}$ be a finite alphabet and let $p = (p(\alpha_i, \alpha_j))_{0 \le i,j \le a-1}$
be an irreducible stochastic matrix. The associated Markov chain $M(p)$
is ergodic and has a (strictly positive) unique stationary distribution $\bar{p} =
(\bar{p}(\alpha_0), \dots, \bar{p}(\alpha_{a-1}))$. Similarly, let $B = \{\beta_0, \dots, \beta_{b-1}\}$ be a finite alphabet
and let $q = (q(\beta_i, \beta_j))_{0 \le i,j \le b-1}$ be a stochastic matrix such that $M(q)$ is
ergodic with unique stationary distribution $\bar{q} = (\bar{q}(\beta_0), \dots, \bar{q}(\beta_{b-1}))$. The
Markov chain $M(p)$ has entropy

$$h(p) = -\sum_{0 \le i,j \le a-1} \bar{p}(\alpha_i)\, p(\alpha_i, \alpha_j) \log p(\alpha_i, \alpha_j).$$
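For a concrete feel for the entropy of a Markov chain, the following sketch computes the stationary distribution of a small stochastic matrix by power iteration and then evaluates the entropy formula with base-2 logarithms. Power iteration is a convenience choice that assumes the chain is aperiodic as well as irreducible; the function names and the two-state matrix are hypothetical examples, not taken from the paper.

```python
from math import log2

def stationary_distribution(P, iterations=200):
    """Approximate the stationary row vector of a stochastic matrix P.

    Uses power iteration, which converges when the chain is irreducible and
    aperiodic (the aperiodicity is an extra assumption of this sketch).
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def markov_entropy(P):
    """Entropy -sum_{i,j} pi_i P_ij log2(P_ij), skipping zero transitions."""
    pi = stationary_distribution(P)
    n = len(P)
    return -sum(pi[i] * P[i][j] * log2(P[i][j])
                for i in range(n) for j in range(n) if P[i][j] > 0)

P = [[0.5, 0.5],
     [0.25, 0.75]]
pi = stationary_distribution(P)
h = markov_entropy(P)
print(pi, h)
```

For this matrix the stationary distribution is (1/3, 2/3), and the entropy is the corresponding average of the row entropies.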
We will assume that $h(p) = h(q)$. Let $\varphi$ be a finitary homomorphism from
$M(p)$ to $M(q)$. For $x = (x_k)_{k\in\mathbb{Z}} \in A^{\mathbb{Z}}$, let $X_i(x) = -\log(p(x_{i-1}, x_i)) - h(p)$.
Similarly, if $\varphi(x) = y = (y_k)_{k\in\mathbb{Z}} \in B^{\mathbb{Z}}$, let $Y_i(x) = -\log(q(y_{i-1}, y_i)) - h(q)$.
Let $S_{m,n} = \sum_{i=m+1}^{n} X_i$ and let $R_{m,n} = \sum_{i=m+1}^{n} Y_i$. Since $\varphi$ is measure
preserving, it follows that $E(X_i) = E(Y_i) = 0$. Let

$$\lambda_p = \max_{0 \le i,j \le a-1} \{-\log(p(\alpha_i, \alpha_j)) : p(\alpha_i, \alpha_j) \neq 0\},$$

and let $\gamma_p = \max_{0 \le i \le a-1} \{-\log(\bar{p}(\alpha_i))\}$.

The following central limit theorem can be found, e.g., in [1], p. 422 under
an additional aperiodicity assumption, and in [2] in much greater generality.
For the reader's convenience, we include a brief proof.

Lemma 12 If $M(p)$ is an ergodic Markov chain on a finite alphabet, then
there exists a constant $\sigma_p \ge 0$ depending only on $p$ such that $\frac{S_{0,n}}{\sqrt{n}} \Rightarrow \chi\sigma_p$ in
law, where $\chi$ denotes a standard normal variable.

We define $\sigma_p^2$ to be the asymptotic informational variance of $p$.

Proof. For any $x \in A^{\mathbb{Z}}$, let $T_0 = \min\{t \ge 0 : x_t = \alpha_0\}$. Inductively, for
$i \ge 0$, let $T_{i+1} = \min\{t > T_i : x_t = \alpha_0\}$. The increments $T_i - T_{i-1}$ are i.i.d.
and have exponential tails. The partial sums $\{S_{T_{i-1},T_i}\}_{i \ge 1}$ are also i.i.d. Let
$d_p = E(T_1 - T_0) > 0$. By an application of the ergodic theorem and the law
of large numbers, $E(S_{T_0,T_1}) = 0$. Since $|X_i| \le \lambda_p$, it follows that $S_{T_0,T_1}^2 \le
(T_1 - T_0)^2 \lambda_p^2$, whence $E\bigl(S_{T_0,T_1}^2\bigr) = c_p^2 < \infty$. Let $N_n = \min\{m \ge 0 : T_m > n\}$.
Since $\frac{N_n d_p}{n} \to 1$ in probability, the random index central limit theorem (see
[1], p. 116) states that

$$\frac{S_{T_0,T_{N_n}}}{\sqrt{n}} \Rightarrow \frac{c_p \chi}{\sqrt{d_p}}. \tag{21}$$

Define $\sigma_p = \frac{c_p}{\sqrt{d_p}}$. Since $E(T_{N_n} - n) \le \max_{0 \le i \le a-1} E(T_0 \mid x_0 = \alpha_i)$ for all
$n \in \mathbb{Z}_+$, it follows that

$$E\bigl(|S_{0,n} - S_{T_0,T_{N_n}}|\bigr) \le \lambda_p E(T_0) + \lambda_p \max_{0 \le i \le a-1} E(T_0 \mid x_0 = \alpha_i).$$

In conjunction with (21), this gives $\frac{S_{0,n}}{\sqrt{n}} \Rightarrow \chi\sigma_p$. □

Let $J_n = \{i \in \{1, \dots, n\} : N_\varphi(T^i x) \ge \min\{i, n + 1 - i\}$ or $N_\varphi(T^{i-1} x) \ge
\min\{i - 1, n + 2 - i\}\}$. Let $I_n = \{1, \dots, n\} - J_n$.

As in §2, we deduce from the CLT and uniform integrability:

Lemma 13 If $\sigma_p^2 \neq \sigma_q^2$, then $\liminf_{n\to\infty} \frac{1}{\sqrt{n}} E(R_{0,n} - S_{0,n})^+ \ge \frac{|\sigma_p - \sigma_q|}{\sqrt{2\pi}}$.

The proofs of Lemma 4 and Theorem 1 adapt to prove the following.

Lemma 14 Suppose $M(p)$ and $M(q)$ are ergodic Markov chains and $\varphi$ is a
finitary homomorphism from $M(p)$ to $M(q)$. Then for all $n$,

$$E(R_{0,n} - S_{0,n})^+ \le \gamma_p + 4\lambda_q \bigl(E(N_\varphi \wedge n) + 1\bigr).$$

Theorem 15 Let $M(p)$ and $M(q)$ be ergodic Markov chains such that $h(p) =
h(q)$ and $\sigma_p^2 \neq \sigma_q^2$. If $\varphi$ is a finitary homomorphism from $M(p)$ to $M(q)$, then
$E\bigl(\sqrt{N_\varphi(x)}\bigr) = \infty$. More precisely, $\liminf_{n\to\infty} \frac{1}{\sqrt{n}} E(N_\varphi(x) \wedge n) \ge C_{p,q} > 0$.



7 Higher moments: a problem 

Theorem 1 and our constructions in §5 suggest the following:

Question. Let $p$ and $q$ be probability vectors with $h(p) = h(q)$. Fix an
integer $k \ge 2$. Suppose that $\varphi$ is a finitary homomorphism from $B(p)$ to
$B(q)$ that satisfies $E\bigl(N_\varphi^{(k-1)/k}\bigr) < \infty$. Does it follow that

$$\sum_i p_i (\log p_i)^k = \sum_j q_j (\log q_j)^k \ ?$$



Acknowledgment 

We thank Alexander Holroyd for helpful discussions, Serban Nacu for com- 
ments and Ben-Zion Rubshtein and Jeff Steif for references. 



References 

[1] R. Durrett (1996), Probability: Theory and Examples, Second Edition. 
Duxbury Press, New York. 

[2] M. I. Gordin and B. A. Lifsic (1978), The central limit theorem for 
stationary Markov processes. Soviet Math. Dokl. 19, 392-394. 

[3] M. Keane and M. Smorodinsky (1979), Bernoulli schemes of the same
entropy are finitarily isomorphic. Annals of Math. 109, 397-406.






[4] T. Liggett (2002), Tagged particle distributions or how to choose a
head at random. In and out of equilibrium (Mambucaba, 2000), Progr.
Probab. 51, Birkhäuser Boston, 133-162.

[5] L. D. Meshalkin (1959), A case of isomorphism of Bernoulli schemes.
Dokl. Akad. Nauk SSSR, 128, 41-44.

[6] D. S. Ornstein (1970), Bernoulli shifts of the same entropy are
isomorphic. Adv. in Math. 4, 337-352.

[7] W. Parry (1979), Finitary isomorphisms with finite expected code 
lengths. Bull. London Math. Soc. 11, 170-176. 

[8] W. Parry (1979), An information obstruction to finite expected coding
length. Ergodic theory (Proc. Conf., Math. Forschungsinst., Oberwolfach,
1978), pp. 163-168, Lecture Notes in Math. 729, Springer, Berlin.

[9] W. Parry and K. Schmidt (1984), Invariants of finitary isomorphisms 
with finite expected code-lengths. Conference in modern analysis and 
probability (New Haven, Conn., 1982), 301-307, Contemp. Math. 26, 
Amer. Math. Soc, Providence, RI. 

[10] K. Petersen (1983), Ergodic Theory. Cambridge University Press. 

[11] K. Schmidt (1984), Invariants for finitary isomorphisms with finite ex- 
pected code lengths. Invent. Math. 76, 33-40. 

[12] D. W. Stroock (1993), Probability theory, an analytic view. Cambridge 
University Press. 






